Ivanova Scalable Scientific Stream Query Processing

نویسنده

  • Milena Ivanova
چکیده

Ivanova, M. 2005. Scalable Scientific Stream Query Processing. Acta Universitatis Upsaliensis. Uppsala Dissertations from the Faculty of Science and Technology 66. 137 pp. Uppsala. ISBN 91-554-6351-7 Scientific applications require processing of high-volume on-line streams of numerical data from instruments and simulations. In order to extract information and detect interesting patterns in these streams scientists need to perform on-line analyses including advanced and often expensive numerical computations. We present an extensible data stream management system, GSDM (Grid Stream Data Manager) that supports scalable and flexible continuous queries (CQs) on such streams. Application dependent streams and query functions are defined through an object-relational model. Distributed execution plans for continuous queries are specified as high-level data flow distribution templates. A built-in template library provides several common distribution patterns from which complex distribution patterns are constructed. Using a generic template we define two customizable partitioning strategies for scalable parallel execution of expensive stream queries: window split and window distribute. Window split provides parallel execution of expensive query functions by reducing the size of stream data units using application dependent functions as parameters. By contrast, window distribute provides customized distribution of entire data units without reducing their size. We evaluate these strategies for a typical high volume scientific stream application and show that window split is favorable when expensive queries are executed on limited resources, while window distribution is better otherwise. Profile-based optimization automatically generates optimized plans for a class of expensive query functions. We further investigate requirements for GSDM in Grid environments. GSDM is a fully functional system for parallel processing of continuous stream queries. GSDM includes components such as a continuous query engine based on a data-driven data flow paradigm, a compiler of CQ specifications into distributed execution plans, stream interfaces and communication primitives. Our experiments with real scientific streams on a shared-nothing architecture show the importance of both efficient processing and communication for efficient and scalable distributed stream processing.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Customizable Parallel Execution of Scientific Stream Queries

Scientific applications require processing highvolume on-line streams of numerical data from instruments and simulations. We present an extensible stream database system that allows scalable and flexible continuous queries on such streams. Application dependent streams and query functions are defined through an object-relational model. Distributed execution plans for continuous queries are desc...

متن کامل

High-performance GRID Database Manager for Scientific Data

The GRID initiative provides an infrastructure for distributed computations among widely distributed high-performance computers. This will allow for exchanging and processing very large amounts of data. The LOFAR project (www.nfra.nl/lofar) is an international initiative to build a versatile, geographically distributed, multi-point radio facility for astrophysics, space physics, atmospheric phy...

متن کامل

Leveraging Distributed Publish/Subscribe Systems for Scalable Stream Query Processing

Existing distributed publish/subscribe systems (DPSS) offer loosely coupled and easy to deploy content-based stream delivery services to a large number of users. However, the lack of query expressiveness limits their application scope. On the other hand, distributed stream processing engines (DSPE) provide efficient processing services for complex stream queries. Nevertheless, these systems are...

متن کامل

Astronomical Data Processing Using SciQL, an SQL Based Query Language for Array Data

SciQL (pronounced as ‘cycle’) is a novel SQL-based array query language for scientific applications with both tables and arrays as first class citizens. SciQL lowers the entrance fee of adopting relational DBMS (RDBMS) in scientific domains, because it includes functionality often only found in mathematics software packages. In this paper, we demonstrate the usefulness of SciQL for astronomical...

متن کامل

Distributed inference and query processing for RFID tracking and monitoring

In this paper, we present the design of a scalable, distributed stream processing system for RFID tracking and monitoring. Since RFID data lacks containment and location information that is key to query processing, we propose to combine location and containment inference with stream query processing in a single architecture, with inference as an enabling mechanism for high-level query processin...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005